Building Blocks

StartR Workshop

Maik Bieleke, PhD

University of Konstanz

November 23, 2024

R

What is R?

R is a programming language and software environment for statistical computing and graphics. Some of its main advantages are:

  • free and open source software
  • easy sharing of data and code for reproducible research
  • collaborative project with many contributors and resources
  • extendible with tens of thousands of packages
  • state-of-the-art data analysis and visualization tools
  • cross-platform software (Windows, macOS, UNIX/Linux)

Creators of the R programming language

(a) Robert Gentleman

 

(b) Ross Ihaka
Figure 1: R was created in 1992 by two statisticians at the University of Auckland, New Zealand.

R as standalone program

Locating R.exe on a Windows computer

Running R.exe

Graphical user interface (GUI)

Locating RGui.exe on a Windows computer

Running RGui.exe

Integrated Development Environment (IDE)

RStudio

What is RStudio?

RStudio is an integrated development environment (IDE) for R and Python.

  • developed and maintained by Posit PBC
  • free software with optional commercial extensions
  • tools for editing, debugging, visualization, and publishing
  • workflows for data projects and reproducible research
  • cross-platform software (Windows, Mac, Linux)

RStudio panes

Console

What is the console pane?

The console is R within RStudio. It provides the area to interactively execute code and get an immediate response. It’s typically used only for quick calculations that one doesn’t want to save.

  • The > signals that R is waiting for your input.
  • The + signals that R is waiting for you to complete your input.

Press enter to submit your input or escape to cancel it.

R as calculator

The console is ideally suited for quick calculations that are not saved.

1 + 1   # addition
[1] 2
4 - 3   # subtraction
[1] 1
2 * 5   # multiplication
[1] 10
12 / 3  # division
[1] 4
2^2     # exponentiation
[1] 4
sqrt(9) # functions (square root)
[1] 3
pi      # constants
[1] 3.141593
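
When several operators are combined, R follows the usual order of operations. The short sketch below uses made-up numbers to show how parentheses change the result.

```r
# Multiplication is evaluated before addition.
2 + 3 * 4     # [1] 14

# Parentheses are evaluated first.
(2 + 3) * 4   # [1] 20

# Exponentiation is evaluated before the minus sign.
-2^2          # [1] -4
```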

Saving results in objects

Assignments are used in R to store information in objects.

  • the operator for assignments is <-

  • the notation for an assignment is object <- ...

    a <- 5
  • to see the information stored in an object, call it by name

    a
    [1] 5

Tip

The shortcut for the assignment operator is Alt + - (Windows) or Option + - (Mac).
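
The right-hand side of an assignment can be any calculation, not just a single number. A minimal sketch (the object name total is only an illustration):

```r
# Store the result of a calculation in an object.
total <- 3 * 4 + 2

# Call the object by name to see the stored result.
total   # [1] 14
```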

Naming objects

R is case sensitive, so a and A would refer to different objects.

  • object names can consist of letters (a-z, A-Z), numbers (0-9), and a few special characters like dots (.) and underscores (_)

  • if you call an object that does not exist, R will return an error

    A
    Error in eval(expr, envir, enclos): object 'A' not found
  • object names cannot start with a number or contain spaces

    1_a <- 5
    Error: <text>:1:2: unexpected input
    1: 1_
         ^
    my a object <- 5
    Error: <text>:1:4: unexpected symbol
    1: my a
           ^
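
For contrast, here is a sketch of names that R accepts. All object names and values are invented for illustration.

```r
# Valid names: letters, numbers (not at the start), dots, and underscores.
my_object <- 1
my.object <- 2
object2   <- 3

# R is case sensitive: "score" and "Score" are two different objects.
score <- 10
Score <- 20
score == Score   # [1] FALSE
```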

Using objects

  • Objects can be used like variables.

    # object "a" is 5, so this is equal to 5 * 2
    a * 2
    [1] 10
  • They can be used repeatedly in the same calculation.

    # using "a" multiple times in a calculation
    1 + (a - 2) * a
    [1] 16
  • The object’s value does not change when it’s used like that.

    # still the same value
    a
    [1] 5

Changing objects

To change an object, it must be assigned again.

  • using new values

    # initialize value of "a"
    a <- 5
    
    # assign a new value to "a"
    a <- 100
    a
    [1] 100
  • using the old value

    # initialize value of "a"
    a <- 5
    
    # increase the old value by 10
    a <- a + 10
    a
    [1] 15
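
A related point that often surprises beginners: objects computed from another object are not updated automatically when that object changes. The sketch below assumes a second, hypothetical object b to illustrate this.

```r
a <- 5
b <- a * 2   # b is computed from the current value of "a"

a <- 100     # assigning a new value to "a" does not update "b"
b            # [1] 10

b <- a * 2   # rerun the calculation to use the new value of "a"
b            # [1] 200
```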

Exercise ✏️

Image: Getty Images

The women’s 100 m world record of 10.49 seconds was set by US athlete Florence Griffith-Joyner in 1988.

  1. Store her finishing time in an object called record.

    Solution
    record <- 10.49
  2. Compute her speed in m/s and save it as speed_ms.

    Solution
    speed_ms <- 100 / record
  3. Compute her speed in km/h and save it as speed_kmh. (Note that km/h = m/s * 3.6.)

    Solution
    speed_kmh <- speed_ms * 3.6

Source

What is the source pane?

The source pane is where you create and edit scripts for data processing and analysis.

  • For a new script, click File > New File > R Script.
  • Scripts are saved as text files with .R extension.

The code in a script is only evaluated after it is sent to the console.

  • copy (Ctrl + C) and paste (Ctrl + V) code into the console
  • place the cursor on a single line or select code with the mouse and click Run

Tip

The shortcut for Run is Ctrl/Cmd + Enter.

Comments (#)

A comment is text that is not evaluated as code. Comments are mainly used to explain what the code does. They are also helpful for structuring scripts and for debugging.

Comments are preceded by a hash # and can be placed on a line of their own or at the end of a line of code.

# This is a comment on a line of its own.
10 - 5 # This is a comment at the end of a line of code.
[1] 5
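
Here is a sketch of how comments might be used to structure a script and to switch off a line temporarily while debugging. The values and object names are made up for illustration.

```r
# --- Step 1: input ------------------------------------------
distance <- 100    # distance in metres
time     <- 12.5   # time in seconds (made-up value)

# --- Step 2: calculation ------------------------------------
speed <- distance / time
# speed <- distance / time * 3.6   # disabled for now: km/h variant

speed   # [1] 8
```

Because the second calculation is preceded by a hash, R skips it; removing the hash would activate the line again.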

Exercise ✏️

Image: Getty Images

  1. Open a new R script in RStudio.

  2. Copy your solution to the last exercise (women’s 100 m world record) into the script.

  3. Add comments to explain what each line of code does.

    Solution
    # Save the record in an object called "record".
    record <- 10.49
    
    # Compute the speed in m/s and save it as "speed_ms".
    speed_ms <- 100 / record
    
    # Convert the speed to km/h and save it as "speed_kmh".
    speed_kmh <- speed_ms * 3.6 # Note: km/h = m/s * 3.6
  4. Save your script.

  5. Run the script and check the results in the console.

Environment

What is the environment pane?

All objects created by an assignment are saved in the workspace.

  • The objects in your workspace are displayed in the environment.
  • The commands used to create objects are displayed in the history.

Tip

The environment is useful for an overview of the available objects. Moreover, you can click on some objects like data frames to view them in a spreadsheet-like viewer.
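
The workspace can also be inspected from the console. The sketch below uses the base R functions ls(), rm(), and exists(); the object names are invented for illustration.

```r
# Create two objects in the workspace.
first_object  <- 1
second_object <- "text"

# List the names of all objects in the workspace.
ls()

# Remove a single object from the workspace.
rm(first_object)

# Check whether an object exists.
exists("first_object")    # [1] FALSE
exists("second_object")   # [1] TRUE
```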

Workspace

The workspace (“global environment”) is where all objects created by an assignment are saved.

I recommend deactivating automatic saving and restoring of the workspace because it can lead to unexpected results.

You can deactivate it in the Global Options under General - Workspace.

Output

What is the output pane?

The output pane provides access to several important features.

  • access and organize files
  • display and zoom into plots
  • view and administer packages
  • search for help and information

Packages

Packages are collections of R functions, datasets, help menus, and examples that extend the capabilities of Base R.

Install packages once with install.packages(), using quotation marks around the package name.

install.packages("foo")

Load a package every time you start a new R session with library(). No quotation marks needed.

library(foo)

Use the :: operator (e.g., dplyr::filter()) to be specific about which package a function comes from.
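
As a small sketch of the :: operator that runs without installing anything, the example below uses two packages that ship with R (tools and stats). The file name and numbers are made up.

```r
# file_ext() comes from the "tools" package, which ships with R
# but is not loaded by default. With ::, no library() call is needed.
tools::file_ext("analysis.R")   # [1] "R"

# The prefix also works for packages that are already loaded
# and makes the origin of the function explicit.
stats::median(c(1, 3, 10))      # [1] 3
```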

Functions

Objects are one fundamental building block of R; functions are the other. The basic syntax of a function is

function_name(option1, option2, ...).

Functions are used to perform specific tasks. For example, the function mean() computes the mean of a vector of numbers. The result can be assigned to a new object.

# Apply the sqrt() function to the number 9 and assign the result to x.
x <- sqrt(9)
x
[1] 3
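
A further sketch with mean(), using made-up numbers. It assumes the base R function c() to combine numbers into a vector and shows that options can be passed by name.

```r
# c() combines several numbers into a vector.
values <- c(2, 4, 9)

# Apply mean() to the vector and assign the result to avg.
avg <- mean(values)
avg   # [1] 5

# Options can be given by name: na.rm = TRUE ignores missing values (NA).
mean(c(1, 2, NA), na.rm = TRUE)   # [1] 1.5

# Functions can be nested: the inner result is passed to the outer function.
round(sqrt(10), digits = 2)       # [1] 3.16
```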

Many functions are available in Base R. Additional functions are provided by packages and can be used after loading the package.

Help

The help menu provides access to the documentation of functions and packages. With the ? operator, you can search for help on a specific function.

# Get help for the function sqrt().
?sqrt
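
Two related base R calls, shown as a sketch: help() is the function form of the ? operator, and args() prints the options of a function without opening the help page.

```r
# Same as ?sqrt, written as a function call.
help("sqrt")

# Show the options of a function directly in the console.
args(sqrt)
```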

Exercise ✏️

Photo courtesy of @chuttersnap

  1. Install the package ggplot2 and load it into your current R session.

    Solution
    install.packages("ggplot2")
    library(ggplot2)
  2. Get help about the ggsave() function. What does it do?

    Solution
    ?ggsave